Parallelizing Compilation through Load-Time Scheduling for a Superscalar Processor Family

Authors

  • Michael Hußmann
  • Michael Thies
  • Uwe Kastens
Abstract

Superscalar processors reduce the execution time of sequential programs by exploiting instruction-level parallelism (ILP). An additional compiler scheduling phase for a concrete target machine can increase the efficiency of this run-time parallelization. If the target machine is not known at compile-time, however, scheduling must be deferred to a later phase immediately before program execution. In this paper we present a novel technique that prepares parallelization at compile-time and performs scheduling at load-time of a program. Our approach, called CALS (Code Annotations for Load-time Scheduling), applies proof-carrying code techniques and a new algorithm to schedule in linear time. Additionally, the closely related task of register allocation is split between compile-time and load-time. CALS achieves improvements of up to 23.8% over simple compilation without scheduling. It obtains results comparable to conventional list scheduling and in some cases outperforms it by up to 12.4%.
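
For orientation, the conventional list scheduling that the abstract uses as a baseline can be sketched as follows. This is a minimal illustration of the textbook greedy algorithm for one basic block, not the CALS algorithm, whose details the abstract does not give. The function name `list_schedule`, its parameters, and the machine model (a fixed issue width, per-instruction latencies, no distinction between functional units) are assumptions made for this example.

```python
from collections import defaultdict


def list_schedule(instrs, deps, latency, width):
    """Greedy cycle-by-cycle list scheduling of one basic block.

    instrs  : instruction names in program order (hypothetical input format)
    deps    : dict mapping an instruction to the earlier instructions it depends on
    latency : dict mapping an instruction to its latency in cycles (>= 1)
    width   : assumed number of instructions that may issue per cycle
    Returns a dict mapping each instruction to its issue cycle.
    """
    # Invert the dependence relation to get successors.
    succs = defaultdict(list)
    for i in instrs:
        for p in deps.get(i, []):
            succs[p].append(i)

    # Priority: longest latency-weighted path to the end of the block.
    # Program order is a topological order, so walk it backwards.
    prio = {}
    for i in reversed(instrs):
        prio[i] = latency[i] + max((prio[s] for s in succs[i]), default=0)

    issue = {}
    remaining = list(instrs)
    cycle = 0
    while remaining:
        # An instruction is ready once all its predecessors have completed.
        ready = [i for i in remaining
                 if all(p in issue and issue[p] + latency[p] <= cycle
                        for p in deps.get(i, []))]
        # Stable sort: ties keep program order.
        ready.sort(key=lambda i: -prio[i])
        for i in ready[:width]:
            issue[i] = cycle
            remaining.remove(i)
        cycle += 1
    return issue


if __name__ == "__main__":
    instrs = ["a", "b", "c", "d"]
    deps = {"c": ["a", "b"], "d": ["c"]}
    latency = {"a": 2, "b": 1, "c": 1, "d": 1}
    print(list_schedule(instrs, deps, latency, width=2))
```

As written, the loop rescans the remaining instructions every cycle, so the cost of this sketch grows faster than linearly with block size, in contrast to the linear-time load-time scheduling the abstract reports for CALS.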

Similar resources

SALT: Efficient Load-Time Scheduling for Superscalar Processor Families Using Compiler Annotations

Superscalar processors exploit instruction-level parallelism (ILP) by dispatching machine instructions to several functional units where they are executed in parallel. The efficiency of parallelization at run-time can be increased through an additional scheduling phase for a concrete target machine in the compiler. But if the mobile code should be executed in a heterogeneous network with process...

Effective Instruction Prefetching In Chip Multiprocessors

threaded application performance, often achieved through instruction level parallelism per chip is increasing, the software and hardware techniques to exploit the potential of studies mostly involve distributed shared memory multiprocessors and fetching will not be fully effective at masking the remote fetch latency. the effective address of the load instructions along that path based upon a hi...

Compilation Support for Superscalar Processors

This thesis describes work done in two areas of compilation support for superscalar processors; register allocation and instruction scheduling. Chapter 1 describes an approach to register allocation for superscalar processors that supports dynamic and speculative out-of-order execution of instructions and guarantees precise interrupts without expensive hardware for managing register usage and m...

Inter-block Scoreboard Scheduling in a JIT Compiler for VLIW Processors

We present a postpass instruction scheduling technique suitable for Just-In-Time (JIT) compilers targeted to VLIW processors. Its key features are: reduced compilation time and memory requirements; satisfaction of scheduling constraints along all program paths; and the ability to preserve existing prepass schedules, including software pipelines. This is achieved by combining two ideas: instruct...

Integrating Parallelizing Compilation Technology and Processor Architecture for Cost-Effective Concurrent Multithreading

As the number of transistors on a single chip continues to grow, it is important to think beyond the traditional approaches of compiler optimizations for deeper pipelines and wider instruction issue units to improve performance. This single-threaded execution model limits these approaches to exploiting only the relatively small amount of instruction-level parallelism available in application pr...


Publication date: 2005